Gridge approximation and Radon compass

V.E.  Maiorov,  K.I.  Oskolkov  and  V.N.  Temlyakov 


Abstract 

Gridge approximation combines greedy algorithms and ridge
approximation. It is a class of algorithmic constructions of ridge
functions, that is, finite linear combinations of planar waves. The goal is to
approximate a given target, which is a multivariate function. On each
step, a new planar wave is added to the preceding linear combination.
This wave is selected greedily, i.e. optimally with regard to both the
direction of propagation and the profile. In Mathematical Statistics,
gridge approximation is known as projection pursuit regression. We
consider gridge approximation in weighted Hilbert function spaces
on d-dimensional Euclidean space.

The notion of Radon compass is introduced: a tool for finding
the optimal direction of propagation on each step of the
algorithm.

The main quantitative result concerns error estimates for gridge
processes in the norm of the Hilbert space of functions supported on the
unit ball, with respect to Lebesgue measure. Fourier analysis of the Radon
transformation, in terms of Chebyshev-Gegenbauer polynomials,
provides the crucial tool in this case.

For a rather wide class of target functions whose polynomial
approximations do not decrease “too rapidly”, gridge approximation is
as efficient as classical algebraic polynomial approximation. In particular,
gridge approximation is not order-saturated.

0.1 Ridge and polynomial approximation. Greedy algorithm

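Let $\mathbb{R}^d$, $d = 1, 2, \dots$, denote the real $d$-dimensional Euclidean space of vectors $\mathbf{x} = (x_1, x_2, \dots, x_d)$, with $\mathbf{x}\cdot\mathbf{y} := x_1 y_1 + \dots + x_d y_d$ and $|\mathbf{x}| := \sqrt{\mathbf{x}\cdot\mathbf{x}}$; further, let $B^d := \{\mathbf{x} : |\mathbf{x}| \le 1\}$ and $S^{d-1} := \{\mathbf{x} : |\mathbf{x}| = 1\}$ be the unit ball and the unit sphere in $\mathbb{R}^d$; $|B^d|$, $|S^{d-1}|$ the volume of $B^d$ and the surface area of $S^{d-1}$; $\mu(d\mathbf{x}) := d\mathbf{x}/|B^d|$ and $\mu(d\theta)$ the normalized Lebesgue measures on $B^d$ and $S^{d-1}$, respectively. Let us fix $d$, for brevity denote $B := B^d$, $S := S^{d-1}$, and in the usual fashion introduce the Hilbert space $\mathcal{L}^2(B)$ of functions $f(\mathbf{x})$ supported in $B$:

$$\mathcal{L}^2(B) := \Big\{ f(\mathbf{x}) : \|f\| = \|f, \mathcal{L}^2(B)\| := \Big(\int_B |f(\mathbf{x})|^2\,\mu(d\mathbf{x})\Big)^{1/2} < \infty \Big\}.$$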


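For $M \ge 0$, a natural $N$ and $f(\mathbf{x}) \in \mathcal{L}^2(B)$ let

$$\mathcal{P}^{d,M} := \mathrm{Span}\,\big\{x_1^{k_1} x_2^{k_2}\cdots x_d^{k_d}\big\}_{k_1+k_2+\dots+k_d \le M}, \qquad E_M[f] := \min_{p\in\mathcal{P}^{d,M}} \|f - p\|,$$

$$\mathcal{R}^N := \Big\{ S(\mathbf{x}) = \sum_{j=1}^N W_j(\mathbf{x}\cdot\theta_j) \Big\}, \qquad \sigma_N[f,\mathcal{R}] = \sigma_N[f] := \inf_{S\in\mathcal{R}^N} \|f - S\|.$$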


$\mathcal{P}^{d,M}$ is the subspace of algebraic polynomials of degree $\le M$ in $d$ variables. The quantity $E_M[f]$ is the classical best $M$-th degree polynomial approximation of $f$.

In the definition of the set $\mathcal{R}^N$, $W_j(x)$ are arbitrary single-variate functions, and $\theta_j$ arbitrary (unit) vectors. This set consists of all $N$-term linear combinations $S(\mathbf{x})$ of functions of planar wave type. In the sequel, we call $S(\mathbf{x}) \in \mathcal{R}^N$ ridge functions of $N$-th order; the quantity $\sigma_N[f]$ is known as the best free ridge approximation of $f$ in $\mathcal{L}^2(B)$.

In modern terminology, the set of all planar waves constitutes the dictionary $\mathcal{R}$. The quantity $\sigma_N[f,\mathcal{R}]$ characterizes the best non-linear $N$-term approximation with regard to $\mathcal{R}$, see [1]. Here, the wave profiles $W_j(x)$ and the wave vectors (directions of propagation) $\theta_j \in S$ are subject to optimization. This problem is indeed rather non-linear with respect to the optimization of wave vectors.

91-J-1076, ONR/ARO DEPSCoR Grant DAAG55-98-1-0002; V.N. Temlyakov: National Science Foundation Grant DMS-9970326 and ONR Grant

Let us note that the existence of the best $N$-term linear combination of planar waves for $N \ge 2$ is non-trivial, due to the possible collapse effect of wave vectors. For more details, see [8] [10].

The gridge approximation process also generates ridge functions

$$S(\mathbf{x}) = G_N[f,\mathbf{x}] = \sum_{j=0}^{N-1} W_j(\mathbf{x}\cdot\theta_j),$$

but now they are constructed step-wise, algorithmically.

By definition, the non-constrained version of such a process (without restrictions on the wave profiles $W(x)$) is described by the following iterations:

$$\begin{aligned}
&G_0[f,\mathbf{x}] := 0, \qquad f_N(\mathbf{x}) := f(\mathbf{x}) - G_N[f,\mathbf{x}];\\
&(\theta_N, W_N(x)) := \arg\min_{\theta\in S}\,\min_{\{W(x)\}} \|f_N(\mathbf{x}) - W(\mathbf{x}\cdot\theta)\|;\\
&G_{N+1}[f,\mathbf{x}] := G_N[f,\mathbf{x}] + W_N(\mathbf{x}\cdot\theta_N), \qquad N = 0, 1, \dots
\end{aligned} \tag{1}$$

This  algorithm  (projection  pursuit)  was  proposed  in  [4]. 

Clearly, gridge approximation may be a branching process, due to non-uniqueness of $\arg\min$ in $\theta$ on a certain step. In this case, we do not give preference to any of the optimal wave vectors. We fix a branch of the gridge process and measure its effectiveness by $\mathcal{R}^g_N[f] := \|f_N\|$.
Our primary goal is the following statement.

Theorem 1 Assume that the sequence of best polynomial approximations of a function $f(\mathbf{x})$ satisfies the estimate $E_M[f] = O(E_{2M}[f])$, $M \to \infty$. Then

$$\mathcal{R}^g_N[f] = O\big(E_{N^{1/(d-1)}}[f]\big), \tag{2}$$

for any branch of the gridge approximation process.

Before  turning  to  the  proof,  let  us  recall  some  related  facts  from  the 
theory  of  greedy  algorithms. 

The problem of gridge approximation is a particular case of a general setting, see [2] and [1], where an arbitrary (normalized) dictionary $\mathcal{D}$ was considered, and the efficiency of the pure greedy algorithm with regard to $\mathcal{D}$ was studied. A set $\mathcal{D}$ of elements from a Hilbert space $H$ is called a dictionary if each $g \in \mathcal{D}$ has norm one ($\|g\| = 1$), and $\overline{\mathrm{Span}}\,\mathcal{D} = H$. For $f \in H$, we denote by $g(f) := \arg\max_{g\in\mathcal{D}} |\langle f, g\rangle|$ one of the elements from $\mathcal{D}$ which maximizes the absolute value of the inner product (we make the additional assumption that such a maximizer exists). Let

$$G[f] = G[f\,|\,\mathcal{D}] := \langle f, g(f)\rangle\, g(f); \qquad R[f] = R[f\,|\,\mathcal{D}] := f - G[f].$$

Then the pure greedy algorithm (PGA) with regard to the dictionary $\mathcal{D}$ is inductively defined by the relations $R_0[f] := f$, $G_0[f] := 0$ and, for $m \ge 1$,

$$G_m[f] := G_{m-1}[f] + G\big[R_{m-1}[f]\big], \qquad R_m[f] := f - G_m[f] = R\big[R_{m-1}[f]\big].$$
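
The PGA recursion can be illustrated by a minimal sketch in finite dimensions. The sketch assumes that the Hilbert space is $\mathbb{R}^n$ with the Euclidean inner product and that the dictionary is a finite list of unit vectors; the names `dot` and `pure_greedy`, the dictionary `D` and the target vector are hypothetical and serve only as an illustration.

```python
import math

def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def pure_greedy(f, dictionary, steps):
    """Run `steps` iterations of the pure greedy algorithm.

    `dictionary` is a finite list of unit vectors (an assumption of this
    sketch). Returns the approximant G_m[f] and the norms ||R_k[f]||.
    """
    residual = list(f)
    approx = [0.0] * len(f)
    norms = [math.sqrt(dot(residual, residual))]
    for _ in range(steps):
        # g(R) maximizes |<R, g>| over the dictionary
        g = max(dictionary, key=lambda e: abs(dot(residual, e)))
        c = dot(residual, g)
        approx = [a + c * gi for a, gi in zip(approx, g)]
        residual = [r - c * gi for r, gi in zip(residual, g)]
        norms.append(math.sqrt(dot(residual, residual)))
    return approx, norms

s = 1 / math.sqrt(2)
# orthonormal basis of R^2 plus one extra unit vector: a redundant dictionary
D = [[1.0, 0.0], [0.0, 1.0], [s, s]]
approx, norms = pure_greedy([2.0, 1.0], D, 3)
```

In this toy run the first step selects the extra element, although two basis vectors alone would represent the target exactly; the residual norms are non-increasing, as they must be for any dictionary.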

There are some general results on the efficiency of PGA in the case of a redundant dictionary, see [2]. Redundancy means that $\mathcal{D}$ is not a minimal system. The set $\mathcal{R}$ of all (normalized in $\mathcal{L}^2(B)$) ridge functions constitutes such a redundant dictionary. Another classical example of a redundant dictionary, for the Hilbert space $\mathcal{L}^2([0,1]^2)$, is the set $\Pi := \mathcal{L}^2([0,1]) \times \mathcal{L}^2([0,1])$, with normalization in $\mathcal{L}^2([0,1]^2)$. In the latter case, an $m$-term approximant looks as follows:

$$\sum_{j=1}^m c_j\,u_j(x_1)\,v_j(x_2).$$

A pioneering work in this direction was done by E. Schmidt [12]. It was understood later that in the case of the dictionary $\Pi$, the pure greedy algorithm always, i.e. for each function $f \in \mathcal{L}^2([0,1]^2)$, realizes the best $m$-term approximation with regard to $\Pi$. This is a strong argument in favor of PGA in nonlinear $m$-term approximation. However, [2] contains an example of a redundant dictionary $\mathcal{D}$ that is an orthonormal basis $\{h_j\}_{j=1}^\infty$ with one extra element added. For this dictionary and $f = h_1 + h_2$ one has

$$\|f - G_m[f,\mathcal{D}]\| \ge c\,m^{-1/2},$$

where $c$ is a positive constant. This means that, in general, PGA has a saturation property. In contrast, Theorem 1 above shows that in the case of the dictionary $\mathcal{R}$, the PGA is not saturated.

Before turning to the proof, let us also comment on relations between $E_M[f]$, $\sigma_N[f]$ and $\mathcal{R}^g_N[f]$.

First of all, obviously $\sigma_N[f] \le \mathcal{R}^g_N[f]$.

Second, there are no a priori restrictions on the profiles of the waves in the setting of the ridge approximation problem. Thus, one has to make sure that the best polynomial approximations $E_M[f]$ are natural terms in which to estimate the efficiency of ridge approximations.

To justify this, one needs to exhibit sufficiently wide classes of functions $f(\mathbf{x})$ where the corresponding lower estimates are typical. Here, partial answers were obtained in [8], [9] and [6], [13]. For $d = 2$ and for each radial function, i.e. $f(\mathbf{x}) = f(|\mathbf{x}|)$, the following estimates hold true, cf. [8]:

°jv[/]  >  g-EkvLf],  N  =  1,2,...  . 

In [6], [13] Nikol'skii-Sobolev type spaces $H^r = H^r(\mathcal{L}^2(B))$ were considered. For a fixed $r > 0$, let us define $H^r$ as the collection of all functions $f(\mathbf{x})$ whose polynomial approximations satisfy the estimate $E_M[f] = O(M^{-r})$, $M \to \infty$. Then (cf. [6]) there exists a function $f(\mathbf{x}) \in H^r$ such that

$$\sigma_N[f] \gg N^{-\frac{r}{d-1}}, \qquad N = 1, 2, \dots$$

Third, the upper estimates

$$\sigma_N[f] \le E_{N-1}[f], \quad d = 2; \qquad \sigma_N[f] \le E_{cN^{1/(d-1)}}[f], \quad d \ge 3, \quad c = c_d > 0, \tag{3}$$

for free ridge approximations are true without any restrictions concerning the order of decay of the sequence $\{E_M[f]\}$.

Indeed, let $\mathcal{R}^N \cap \mathcal{P}^{1,M}$ denote the subset of $\mathcal{R}^N$ consisting of linear combinations of planar wave polynomials of degree $\le M$:

$$\mathcal{R}^N \cap \mathcal{P}^{1,M} := \Big\{ P(\mathbf{x}) = \sum_{j=1}^N P_j(\mathbf{x}\cdot\theta_j),\ P_j(x) \in \mathcal{P}^{1,M} \Big\}.$$

Obviously, $\mathcal{R}^N \cap \mathcal{P}^{1,M} \subset \mathcal{P}^{d,M}$. On the other hand, it is also known, see e.g. [11], that

$$\mathcal{P}^{d,M} = \mathcal{R}^{N_M} \cap \mathcal{P}^{1,M}, \quad \text{where } N_M = O(M^{d-1}), \quad M \to \infty,$$

that is, every polynomial $P(\mathbf{x}) \in \mathcal{P}^{d,M}$ can be represented as a ridge polynomial of order $N_M$ and degree $M$, and the number $N_M$ of planar wave polynomials required for the representation satisfies the indicated estimate$^2$ from above. In the particular case of $d = 2$, see e.g. [5], $N_M \le M + 1$. These purely algebraic facts imply the estimates (3).
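
As a small illustration of such a ridge representation, the bivariate monomial $x_1 x_2$ of degree 2 is a sum of two planar wave polynomials of degree 2, since $x_1 x_2 = \tfrac12 (\mathbf{x}\cdot\theta_1)^2 - \tfrac12 (\mathbf{x}\cdot\theta_2)^2$ with $\theta_1 = (1,1)/\sqrt2$, $\theta_2 = (1,-1)/\sqrt2$. The sketch below checks this identity numerically; the names `THETA_1`, `THETA_2` and `ridge_sum` are hypothetical and chosen only for this example.

```python
import math
import random

S = 1 / math.sqrt(2)
THETA_1, THETA_2 = (S, S), (S, -S)      # two unit wave vectors (our choice)

def ridge_sum(x, y):
    """Sum of two planar wave polynomials of degree 2."""
    t1 = x * THETA_1[0] + y * THETA_1[1]
    t2 = x * THETA_2[0] + y * THETA_2[1]
    return 0.5 * t1 ** 2 - 0.5 * t2 ** 2

rng = random.Random(0)
samples = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]
# largest deviation from the monomial x*y over the sample points
max_err = max(abs(ridge_sum(x, y) - x * y) for x, y in samples)
```

Here two waves suffice for a single monomial; the bound $N_M \le M + 1$ quoted above concerns arbitrary polynomials of degree $M$ in two variables.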

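We will also prove an analogue of Theorem 1 concerning rates of gridge polynomial approximation. The corresponding algorithm is defined by the relations:

$$\begin{aligned}
&G_0^{\mathrm{pol}}(\mathbf{x}) := 0; \qquad R_N(\mathbf{x}) := f(\mathbf{x}) - G_N^{\mathrm{pol}}[f,\mathbf{x}],\\
&(\theta_N, p_N(x)) := \arg\min_{\theta\in S,\ p\in\mathcal{P}^{1,N}} \big\|\big(f(\mathbf{x}) - G_N^{\mathrm{pol}}(\mathbf{x})\big) - p(\mathbf{x}\cdot\theta)\big\|;\\
&G_{N+1}^{\mathrm{pol}}(\mathbf{x}) := G_N^{\mathrm{pol}}(\mathbf{x}) + p_N(\mathbf{x}\cdot\theta_N), \qquad N = 0, 1, \dots
\end{aligned} \tag{4}$$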

Such an algorithm is, in general, also branching, because of non-uniqueness of the minimizer in the wave vector $\theta$. If we follow any of such branches, after $N \ge 1$ steps the resulting approximant $G_N^{\mathrm{pol}}(\mathbf{x})$ of $f(\mathbf{x})$ will be a sum of planar wave polynomials,

$$G_N^{\mathrm{pol}}(\mathbf{x}) = \sum_{j=0}^{N-1} p_j(\mathbf{x}\cdot\theta_j), \qquad p_j(x) \in \mathcal{P}^{1,j},$$

so that $G_N^{\mathrm{pol}}(\mathbf{x}) \in \mathcal{R}^N \cap \mathcal{P}^{1,N}$.

Since the profiles of the waves are single-variate algebraic polynomials, this algorithm may be of greater interest in applications.

For a fixed branch, let $\mathcal{R}^p_N[f] := \|R_N\|$.

Theorem 2 Assume that the sequence of best polynomial approximations of a function $f(\mathbf{x})$ satisfies the estimate $E_M[f] = O(E_{2M}[f])$, $M \to \infty$. Then

$$\mathcal{R}^p_N[f] = O\big(E_{N^{1/(d-1)}}[f]\big), \qquad N \to \infty, \tag{5}$$

for any branch of the gridge polynomial approximation process.

Let  us  outline  the  subsequent  contents  of  the  paper. 

In the next section, the Radon compass $C[f,\theta]$, $\theta \in S$, is introduced. For $f \in \mathcal{L}^2(B)$, $C[f,\theta]$ is a non-negative continuous function on the sphere $S$, and the maximizer(s)

$$\theta_N := \arg\max_{\theta\in S} C[f_N,\theta], \qquad f_N := f - G_N[f], \tag{6}$$

indicate the optimal wave vector on the $N$-th step of the gridge approximation. The definition of the Radon compass is rather geometrical. It is applicable to a wider class of gridge approximation processes in Hilbert spaces than just $\mathcal{L}^2(B)$. However, in the case of $\mathcal{L}^2(B)$ the sequence $\{C[f_N,\theta]\}_{N=0}^\infty$ is uniformly continuous on $S$ (cf. Lemma 3 below). The latter property seems important in applications.

$^2$ The exact values of the numbers $\nu(M,d) := \min\{N : \mathcal{P}^{d,M} = \mathcal{R}^N \cap \mathcal{P}^{1,M}\}$ seem to be unknown.

After that, we turn to Fourier-Chebyshev analysis in $\mathcal{L}^2(B)$, which is crucial in the error estimates. Based on the corresponding Parseval identities, we derive the important property of polynomial shrinkage. The essence is that optimization in profiles “improves $\mathcal{L}^2(B)$-smoothness” along the sequences $f_N$, $R_N$. This property is expressed by monotonicity relations of the type $E_M[f_{N+1}] \le E_M[f_N]$, $N = 0, 1, \dots$. In fact, using shrinkage on each step, we replace exact maximization of the Radon compass by averaging over the sphere: $\max_{\theta\in S} C[f_N,\theta] \to \int_S C[f_N,\theta]\,\mu(d\theta)$. In other words, the strict optimization of the couple $(\theta, W(x))$ is replaced by a “partially stochastic” one, as follows:

$$\min_{\theta\in S}\,\min_{\{W(x)\}} \|f - W(\mathbf{x}\cdot\theta)\|^2 \;\longrightarrow\; \int_S \min_{\{W(x)\}} \|f - W(\mathbf{x}\cdot\theta)\|^2\,\mu(d\theta).$$

In the end we reduce the estimation problem to that of convergence rates of the sequence of truncated integrals, of the type

$$a_{N+1} \le \int_0^1 \min\big(a_N, \varepsilon(\xi)\big)\,d\xi, \qquad N = 0, 1, \dots,$$

where $\varepsilon(\xi)$ is a positive integrable function.


0.2  Optimization  of  profiles.  Radon  compass 

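Let $\omega(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^d$, be an integrable and, for simplicity, radial weight function, i.e. $\omega(\mathbf{x}) = \omega(|\mathbf{x}|) \ge 0$, $\int_{\mathbb{R}^d} \omega(|\mathbf{x}|)\,d\mathbf{x} = |S^{d-1}| \int_0^\infty x^{d-1}\omega(x)\,dx < \infty$. Denote by $\mathcal{L}^2_\omega(\mathbb{R}^d)$ the Hilbert space of functions $f(\mathbf{x})$ square-integrable with respect to $\omega$:

$$\mathcal{L}^2_\omega(\mathbb{R}^d) := \Big\{ f(\mathbf{x}) : \|f, \mathcal{L}^2_\omega(\mathbb{R}^d)\| := \Big(\int_{\mathbb{R}^d} |f(\mathbf{x})|^2\,\omega(\mathbf{x})\,d\mathbf{x}\Big)^{1/2} < \infty \Big\}.$$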
For a fixed wave vector $\theta$ and a function $f \in \mathcal{L}^2_\omega(\mathbb{R}^d)$, let us consider the problem of best approximation of $f$ by planar waves propagating in the direction of $\theta$:

$$\|f - W(\mathbf{x}\cdot\theta), \mathcal{L}^2_\omega(\mathbb{R}^d)\| \to \min \quad \text{in profiles } W(x). \tag{7}$$

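It is not hard to solve this problem using the direct Radon transformation. Namely, for a function $g(\mathbf{x}) \in \mathcal{L}^1(\mathbb{R}^d)$ and $\theta \in S^{d-1}$, $x \in \mathbb{R}^1$, let $\mathrm{Rad}[g;\theta,x] := \int_{\mathbf{x}\cdot\theta = x} g(\mathbf{x})\,d\mathbf{x}'$, where the integration is taken with respect to $(d-1)$-dimensional Lebesgue measure on the hyperplane $\mathbf{x}\cdot\theta = x$.

Note that the Radon transform $\mathrm{Rad}[\omega;\theta,x]$ of the weight does not depend on $\theta$ because $\omega(\mathbf{x}) = \omega(|\mathbf{x}|)$, and in fact

$$\begin{aligned}
\mathrm{Rad}[\omega;\theta,x] &= \int_{\mathbb{R}^{d-1}} \omega\Big(\sqrt{x^2 + x_1^2 + \dots + x_{d-1}^2}\Big)\,dx_1\cdots dx_{d-1}\\
&= |S^{d-2}| \int_0^\infty t^{d-2}\,\omega\big(\sqrt{x^2+t^2}\big)\,dt\\
&= |S^{d-2}| \int_{|x|}^\infty \omega(\xi)\,(\xi^2 - x^2)^{\frac{d-3}{2}}\,\xi\,d\xi =: w_{\omega,d}(x).
\end{aligned}$$

The weight $\omega(\mathbf{x}) = 1/|B^d|$, $\mathbf{x} \in B^d$; $\omega(\mathbf{x}) = 0$, $|\mathbf{x}| > 1$ (the normalized characteristic function of $B^d$) corresponds to the space $\mathcal{L}^2(B)$. For this particular weight, we have

$$w_d(x) = \frac{|B^{d-1}|}{|B^d|}\,(1 - x^2)_+^{\frac{d-1}{2}}, \qquad r_+ := \max(r, 0). \tag{8}$$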
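
Formula (8) can be sanity-checked numerically. Since $w_d$ is the Radon transform of a probability density on the ball, its integral over $(-1,1)$ should equal 1. The sketch below evaluates $w_d$ from the standard volume formula $|B^d| = \pi^{d/2}/\Gamma(d/2+1)$ and integrates it by the midpoint rule; the function names `ball_volume`, `w` and `total_mass` are hypothetical.

```python
import math

def ball_volume(d):
    """Volume of the unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def w(d, x):
    """Radon transform (8) of the normalized indicator of the unit ball."""
    t = max(1.0 - x * x, 0.0)
    return ball_volume(d - 1) / ball_volume(d) * t ** ((d - 1) / 2)

def total_mass(d, n=20000):
    """Midpoint-rule approximation of the integral of w_d over (-1, 1)."""
    h = 2.0 / n
    return sum(w(d, -1.0 + (i + 0.5) * h) for i in range(n)) * h

masses = {d: total_mass(d) for d in (2, 3, 4, 5)}
```

For $d = 3$ the weight reduces to $w_3(x) = \tfrac34 (1 - x^2)_+$, which gives a quick closed-form check of the constant $|B^{2}|/|B^{3}| = 3/4$.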

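Lemma 1 The optimal profile $W(\theta,x) := \arg\min_{\{W(x)\}}$ in the problem (7) is defined for $x \in B_\omega := \mathrm{supp}\, w_{\omega,d}$ by

$$W(\theta,x) = \mathcal{E}_\omega[f;\theta,x] := \frac{\mathrm{Rad}[f\omega;\theta,x]}{\mathrm{Rad}[\omega;\theta,x]} = \frac{\mathrm{Rad}[f\omega;\theta,x]}{w_{\omega,d}(x)} \tag{9}$$

and

$$\begin{aligned}
\min_{\{W(x)\}} \|f - W(\mathbf{x}\cdot\theta), \mathcal{L}^2_\omega(\mathbb{R}^d)\|^2 &= \|f, \mathcal{L}^2_\omega(\mathbb{R}^d)\|^2 - C_\omega[f,\theta],\\
C_\omega[f,\theta] := \|\mathcal{E}_\omega[f;\theta,\mathbf{x}\cdot\theta], \mathcal{L}^2_\omega(\mathbb{R}^d)\|^2 &= \int_{B_\omega} |\mathcal{E}_\omega[f;\theta,x]|^2\,w_{\omega,d}(x)\,dx.
\end{aligned} \tag{10}$$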
J 
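Indeed, if $U(x)$ is a single-variate function such that the corresponding planar wave $U(\mathbf{x}\cdot\theta)$ belongs to $\mathcal{L}^2_\omega(\mathbb{R}^d)$, we have

$$\begin{aligned}
&\int_{\mathbb{R}^d} \big(f(\mathbf{x}) - W(\theta,\mathbf{x}\cdot\theta)\big)\,U(\mathbf{x}\cdot\theta)\,\omega(\mathbf{x})\,d\mathbf{x}\\
&\quad= \int_{B_\omega} \Big( \int_{\mathbf{x}\cdot\theta = x} \big(f(\mathbf{x}) - W(\theta,x)\big)\,U(x)\,\omega(\mathbf{x})\,d\mathbf{x}' \Big)\,dx\\
&\quad= \int_{B_\omega} \big(\mathrm{Rad}[f\omega;\theta,x] - \mathrm{Rad}[\omega;\theta,x]\,\mathcal{E}_\omega[f;\theta,x]\big)\,U(x)\,dx = 0,
\end{aligned}$$

which means that the difference $f(\mathbf{x}) - W(\theta,\mathbf{x}\cdot\theta)$ is orthogonal in $\mathcal{L}^2_\omega(\mathbb{R}^d)$ to the subspace of all planar waves propagating in the direction $\theta$. This proves the extremal property of the profile function (9) in the problem (7), and (10) also easily follows.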

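We call the function $C_\omega[f,\theta]$, $\theta \in S^{d-1}$, the Radon compass. It is non-negative and, as is not hard to see, continuous on $S^{d-1}$.

The special role of the Radon compass is seen from (10):

$$\min_{\theta\in S^{d-1}}\,\min_{\{W(x)\}} \|f - W(\mathbf{x}\cdot\theta), \mathcal{L}^2_\omega(\mathbb{R}^d)\|^2 = \|f, \mathcal{L}^2_\omega(\mathbb{R}^d)\|^2 - \max_{\theta\in S^{d-1}} C_\omega[f,\theta]. \tag{11}$$

The maximizer $\theta_\omega := \arg\max_{\theta\in S^{d-1}} C_\omega[f,\theta]$ indicates the optimal direction on a typical step of the algorithm (1). Loosely speaking, the Radon compass serves as a navigational tool in the gridge approximation process:

$$\begin{aligned}
&G_0[f,\mathbf{x}] = 0; \qquad f_N = f - G_N[f];\\
&\theta_N = \arg\max_{\theta\in S} C[f_N;\theta], \qquad W_N(x) = \mathcal{E}[f_N;\theta_N,x];\\
&G_{N+1}[f,\mathbf{x}] = G_N[f,\mathbf{x}] + W_N(\mathbf{x}\cdot\theta_N), \qquad N = 0, 1, \dots
\end{aligned} \tag{12}$$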
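
The navigation scheme (12) can be mimicked on sampled data. The following is a rough discrete sketch for $d = 2$, not the construction analyzed in the paper: the ball is replaced by random sample points, the optimal profile by bin-wise averages along the direction (a piecewise-constant stand-in for the ratio of Radon transforms in (9)), and the maximization over the sphere by a search over a finite grid of angles. The target function, the bin count, the angle grid and all function names are assumptions made for illustration.

```python
import math
import random

def bin_index(t, bins):
    """Index of the bin of [-1, 1] that contains t."""
    return min(int((t + 1.0) / 2.0 * bins), bins - 1)

def profile_and_compass(points, values, angle, bins):
    """Bin-wise means of `values` along direction `angle` (discrete profile)
    and the mean square of the resulting planar wave (discrete compass)."""
    c, s = math.cos(angle), math.sin(angle)
    sums, counts = [0.0] * bins, [0] * bins
    for (x, y), v in zip(points, values):
        k = bin_index(x * c + y * s, bins)
        sums[k] += v
        counts[k] += 1
    means = [sm / ct if ct else 0.0 for sm, ct in zip(sums, counts)]
    compass = sum(ct * mu * mu for ct, mu in zip(counts, means)) / len(values)
    return means, compass

def gridge_step(points, values, bins=16, directions=60):
    """One greedy step: take the grid direction with the largest compass
    and subtract the corresponding binned planar wave."""
    angles = [math.pi * j / directions for j in range(directions)]
    best = max(angles, key=lambda a: profile_and_compass(points, values, a, bins)[1])
    means, compass = profile_and_compass(points, values, best, bins)
    c, s = math.cos(best), math.sin(best)
    residual = [v - means[bin_index(x * c + y * s, bins)]
                for (x, y), v in zip(points, values)]
    return best, compass, residual

rng = random.Random(1)
points = []
while len(points) < 1500:                # rejection sampling in the unit disc
    x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
    if x * x + y * y <= 1.0:
        points.append((x, y))
values = [math.sin(3 * x) + x * y for x, y in points]   # hypothetical target
energy = [sum(v * v for v in values) / len(values)]
compasses = []
for _ in range(4):
    _, compass, values = gridge_step(points, values)
    compasses.append(compass)
    energy.append(sum(v * v for v in values) / len(values))
```

In this discrete setting the drop of the mean-square residual on each step equals the selected compass value, which is the sampled counterpart of the identity (11).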

Appropriate versions of the Radon compass can also be constructed for gridge processes with constrained profiles. In the special case of polynomial gridge approximation (4), this construction is based on the best algebraic approximation of $\mathcal{E}[f;\theta,x]$, cf. (4), (8), (9):

$$\begin{aligned}
&Q_N[f;\theta,x] := \arg\min_{p(x)\in\mathcal{P}^{1,N}} \int_{-1}^1 |\mathcal{E}[f;\theta,x] - p(x)|^2\,w_d(x)\,dx,\\
&C_N[f;\theta] := \int_{-1}^1 |Q_N[f;\theta,x]|^2\,w_d(x)\,dx,\\
&\theta_N = \arg\max_{\theta\in S^{d-1}} C_N[R_N;\theta], \qquad p_N(x) = Q_N[R_N;\theta_N,x].
\end{aligned} \tag{13}$$
0.3 Fourier-Chebyshev analysis. Shrinkage

The quantitative results of this paper concern only approximation in the metric of the Hilbert space $\mathcal{L}^2(B)$. It should be noted that proper analogs for approximation in metrics different from $\mathcal{L}^2(B)$ are not known. Such analogs represent an interesting circle of open problems, even for weighted Hilbert spaces $\mathcal{L}^2_\omega(\mathbb{R}^d)$; a particular example is $\mathcal{L}^2_\omega$ with the Gauss weight $\omega(\mathbf{x}) = e^{-\pi|\mathbf{x}|^2}$.

From now on, for the sake of brevity, we will use the notations

$$B = B^d, \quad \|f\| = \|f, \mathcal{L}^2(B)\|, \quad S = S^{d-1}, \quad \|a\|_S^2 := \int_S |a(\theta)|^2\,\mu(d\theta),$$

$$w(x) := \frac{|B^{d-1}|}{|B^d|}\,(1 - x^2)_+^{\frac{d-1}{2}}, \qquad \|V\|_w^2 := \int_{-1}^1 |V(x)|^2\,w(x)\,dx,$$

and denote by $\{u_n(x)\}_{n=0}^\infty = \{u_{n,d}(x)\}_{n=0}^\infty$ the system of Chebyshev-Gegenbauer polynomials orthonormal on $(-1,1)$ with the weight $w(x)$, see (8).

Let us temporarily put on hold the optimization of wave vectors in the definitions of algorithms (1) and (4). Instead, let us fix an arbitrary sequence $\Theta = \{\theta_N\}_{N=0}^\infty \subset S$ and consider the profile-greedy process, in which only the profiles of the waves are optimized on each step. Such a process consists of the iterations $f_0 = R_0 := f$,

$$\begin{aligned}
&W_N(x) := \arg\min_{\{W\}} \|f_N - W(\mathbf{x}\cdot\theta_N)\|;\\
&p_N(x) := \arg\min_{p\in\mathcal{P}^{1,N}} \|R_N - p(\mathbf{x}\cdot\theta_N)\|;\\
&f_{N+1} := f_N - W_N(\mathbf{x}\cdot\theta_N), \qquad R_{N+1} := R_N - p_N(\mathbf{x}\cdot\theta_N).
\end{aligned} \tag{14}$$

Theorem 3 For every sequence of wave vectors $\Theta$, the profile-greedy processes (14) are polynomial shrinkages: the matrices of best approximations are double-monotone$^3$:

$$E_n[f_{N+1}] \le E_n[f_N], \qquad E_n[R_{N+1}] \le E_n[R_N]. \tag{15}$$

This statement is a part of the proof of Theorems 1 and 2. We prove it here, using the Chebyshev-Fourier expansion$^4$ and the corresponding Parseval identity, cf. e.g. [11]:

$$\begin{aligned}
&a_n[f,\theta] := \int_B f(\mathbf{x})\,u_n(\mathbf{x}\cdot\theta)\,\mu(d\mathbf{x}), \qquad \lambda_n = \binom{n+d-1}{d-1},\\
&f(\mathbf{x}) \overset{\mathcal{L}^2(B)}{=} \int_S \Big(\sum_{n=0}^\infty \lambda_n\,a_n[f,\theta]\,u_n(\mathbf{x}\cdot\theta)\Big)\,\mu(d\theta),\\
&\|f\|^2 = \sum_{n=0}^\infty \lambda_n \int_S |a_n[f,\theta]|^2\,\mu(d\theta) = \sum_{n=0}^\infty \lambda_n\,\|a_n[f]\|_S^2.
\end{aligned} \tag{16}$$

$^3$ The inequalities $E_{n+1}[f_N] \le E_n[f_N]$, $E_{n+1}[R_N] \le E_n[R_N]$ are trivial.

$^4$ In fact, this expansion represents the operator of inverse Radon transformation $\mathrm{Rad}^{-1}[w\,\cdot]$ restricted to the functions supported in $B$. The polynomials $u_n(x)$ are eigenfunctions of this operator, and $\lambda_n$ are multiples of the eigenvalues: $|B^d|\,\mathrm{Rad}^{-1}[w u_n](x) = \lambda_n u_n(x)$.

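The operator of orthogonal projection $P_M[f,\mathbf{x}]$ in $\mathcal{L}^2(B)$ onto the subspace of algebraic polynomials $\mathcal{P}^{d,M}$ and the values $E_M[f]$ of best polynomial approximation are given by

$$\begin{aligned}
&P_M[f,\mathbf{x}] = \int_S \Big(\sum_{n=0}^M \lambda_n\,a_n[f,\theta]\,u_n(\mathbf{x}\cdot\theta)\Big)\,\mu(d\theta),\\
&E_M[f] = \|f - P_M[f]\| = \Big(\sum_{n=M+1}^\infty \lambda_n\,\|a_n[f]\|_S^2\Big)^{1/2}.
\end{aligned} \tag{17}$$

The coefficient $a_n[f,\theta]$ in the expansion (16) is called the $n$-th Chebyshev momentum of $f$.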

For profile-greedy processes (14), all Chebyshev momenta are shrinking in $\mathcal{L}^2(S)$, which can be seen from the following statement.

Lemma 2 For $n, N = 0, 1, \dots$

$$\begin{aligned}
&\|a_n[f_{N+1}]\|_S \le \|a_n[f_N]\|_S \le \|a_n[f]\|_S,\\
&\|a_n[R_{N+1}]\|_S \le \|a_n[R_N]\|_S \le \|a_n[f]\|_S.
\end{aligned} \tag{18}$$

Indeed, as a function of $\theta \in S^{d-1}$, $a_n[f,\theta]$ is a spherical polynomial of degree $n$, satisfying $a_n[f,-\theta] = (-1)^n a_n[f,\theta]$. Let us denote by $\mathcal{T}_n = \mathcal{T}_{n,d}$ the subspace of all spherical polynomials with this property, and by $K_n(x) = K_{n,d}(x)$ the Dirichlet kernel for $\mathcal{T}_n$. This kernel is the unique algebraic polynomial of degree $n$ that satisfies $K_n(-x) = (-1)^n K_n(x)$ and represents the identity operator on $\mathcal{T}_n$ by convolution on the sphere $S$:

$$a(\varphi) = \int_S a(\theta)\,K_n(\theta\cdot\varphi)\,\mu(d\theta), \qquad \forall a \in \mathcal{T}_n,\ \varphi \in S \tag{19}$$

(see [11]). It follows from the definition that for each fixed $\varphi \in S$

$$\int_S \big(K_n(\theta\cdot\varphi)\big)^2\,\mu(d\theta) = K_n(1) = \dim \mathcal{T}_n = \binom{n+d-1}{d-1} = \lambda_n. \tag{20}$$
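
The dimension count in (20) can be verified by elementary arithmetic. The sketch below assumes the standard decomposition of spherical polynomials of degree $\le n$ and parity $(-1)^n$ into spherical harmonics of degrees $n, n-2, n-4, \dots$, together with the classical dimension formula for harmonics of degree $k$ on $S^{d-1}$; neither fact is stated in the text above, and the function names are hypothetical.

```python
import math

def comb0(n, k):
    """Binomial coefficient, extended by 0 for negative n."""
    return math.comb(n, k) if n >= 0 else 0

def harmonic_dim(k, d):
    """Dimension of the spherical harmonics of degree k on S^{d-1}, d >= 2."""
    return comb0(k + d - 1, d - 1) - comb0(k + d - 3, d - 1)

def dim_T(n, d):
    """Dimension of T_n, assembled from harmonics of degrees n, n-2, ..."""
    return sum(harmonic_dim(k, d) for k in range(n, -1, -2))

def lam(n, d):
    """lambda_n = C(n+d-1, d-1), as in (16) and (20)."""
    return math.comb(n + d - 1, d - 1)

# table of (n, dim T_n, lambda_n) for d = 3; the last two columns agree
table = [(n, dim_T(n, 3), lam(n, 3)) for n in range(6)]
```

The sum telescopes, which is why the two columns of `table` coincide for every $n$.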

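Further, let $V(x)$ be a single-variate function with $\|V\|_w < \infty$ and with the single-variate Chebyshev-Fourier representation

$$V(x) = \sum_{n=0}^\infty V_n\,u_n(x), \qquad V_n = \int_{-1}^1 V(x)\,u_n(x)\,w(x)\,dx, \quad n = 0, 1, \dots,$$

and let $\varphi \in S$. For each fixed $\mathbf{x}$, $u_n(\mathbf{x}\cdot\theta)$ as a function of $\theta \in S$ is a spherical polynomial of the class $\mathcal{T}_n$, and it follows from (19) that

$$V(\mathbf{x}\cdot\varphi) \overset{\mathcal{L}^2(B)}{=} \sum_{n=0}^\infty V_n\,u_n(\mathbf{x}\cdot\varphi) = \int_S \Big(\sum_{n=0}^\infty \lambda_n\,\frac{V_n}{\lambda_n}\,K_n(\theta\cdot\varphi)\,u_n(\mathbf{x}\cdot\theta)\Big)\,\mu(d\theta). \tag{21}$$

Comparing (21) and (16), we see that the Chebyshev momenta of a planar wave function $V(\mathbf{x}\cdot\varphi)$ are multiples of shifted Dirichlet kernels:

$$a_n[V(\mathbf{x}\cdot\varphi),\theta] = \frac{V_n}{\lambda_n}\,K_n(\theta\cdot\varphi). \tag{22}$$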

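Now fix a wave vector $\varphi \in S$ and consider the Chebyshev-Fourier expansion of the optimal profile $W(x) = \mathcal{E}[f;\varphi,x]$ in the direction $\varphi$, cf. (9). With $\omega$ the normalized characteristic function of $B$, we have

$$\begin{aligned}
&\mathcal{E}[f;\varphi,x] = \sum_{n=0}^\infty \mathcal{E}_n[f,\varphi]\,u_n(x),\\
&\mathcal{E}_n[f,\varphi] = \int_{-1}^1 \mathcal{E}[f;\varphi,x]\,u_n(x)\,w(x)\,dx = \int_{-1}^1 \mathrm{Rad}[f\omega;\varphi,x]\,u_n(x)\,dx\\
&\qquad\qquad = \int_B f(\mathbf{x})\,u_n(\mathbf{x}\cdot\varphi)\,\mu(d\mathbf{x}) = a_n[f,\varphi], \qquad n = 0, 1, \dots,
\end{aligned} \tag{23}$$

and it follows from (22) that the momenta of the corresponding planar wave $\mathcal{E}[f;\varphi,\mathbf{x}\cdot\varphi]$ and of the difference $f^{(1)}(\mathbf{x}) = f(\mathbf{x}) - \mathcal{E}[f;\varphi,\mathbf{x}\cdot\varphi]$ are given by

$$\begin{aligned}
&a_n\big[\mathcal{E}[f;\varphi,\mathbf{x}\cdot\varphi],\theta\big] = \frac{a_n[f,\varphi]}{\lambda_n}\,K_n(\theta\cdot\varphi);\\
&a_n[f^{(1)},\theta] = a_n[f,\theta] - \frac{a_n[f,\varphi]}{\lambda_n}\,K_n(\theta\cdot\varphi).
\end{aligned} \tag{24}$$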
Thus, applying (19), (20) and (24) we see that

$$\|a_n[f^{(1)}]\|_S^2 = \|a_n[f]\|_S^2 - \frac{|a_n[f,\varphi]|^2}{\lambda_n}. \tag{25}$$
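
The shrinkage identity (25) can be checked numerically in the simplest case $d = 2$. The sketch assumes the following concrete model, which is ours and not spelled out in the text: the sphere is the unit circle with normalized arc length, $\mathcal{T}_n$ consists of trigonometric polynomials with frequencies $n, n-2, \dots$, the kernel is $K_n(\cos\tau) = \sum_k \cos(k\tau)$ over $k = -n, -n+2, \dots, n$, and $\lambda_n = n + 1$. All names are hypothetical.

```python
import math
import random

def kernel(n, tau):
    """K_n(cos tau): sum of cos(k*tau) over k = -n, -n+2, ..., n."""
    return sum(math.cos(k * tau) for k in range(-n, n + 1, 2))

def make_poly(n, rng):
    """Random real element of T_n on the circle (frequencies n, n-2, ...)."""
    coeffs = [(k, rng.uniform(-1, 1), rng.uniform(-1, 1)) for k in range(n, -1, -2)]
    return lambda t: sum(c * math.cos(k * t) + s * math.sin(k * t) for k, c, s in coeffs)

def mean_square(fn, m):
    """Integral of fn^2 against normalized arc length, m-point uniform rule."""
    return sum(fn(2 * math.pi * j / m) ** 2 for j in range(m)) / m

n, phi = 5, 0.7
rng = random.Random(0)
a = make_poly(n, rng)
lam = n + 1                      # lambda_n for d = 2
shrunk = lambda t: a(t) - a(phi) / lam * kernel(n, t - phi)
m = 4 * n + 4                    # more than 2n nodes: exact for squares in T_n
lhs = mean_square(shrunk, m)                     # left-hand side of (25)
rhs = mean_square(a, m) - a(phi) ** 2 / lam      # right-hand side of (25)
```

The two sides agree up to rounding, and the left-hand side never exceeds the mean square of the original polynomial, which is the shrinkage stated in Lemma 2.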

Analogously, a typical step of the polynomial profile-greedy process (14) consists of the best polynomial approximation $\min_{p\in\mathcal{P}^{1,N}} \|\mathcal{E}[f;\varphi] - p\|_w$. The partial sum $Q_N[f;\varphi,x] = \sum_{n=0}^N a_n[f,\varphi]\,u_n(x)$ of the expansion of $\mathcal{E}[f;\varphi,x]$ provides the minimizing polynomial in the latter problem.

Let $R_N(\mathbf{x}) := f(\mathbf{x}) - Q_N[f;\varphi,\mathbf{x}\cdot\varphi]$. Then the relations (24), (25) are modified as follows:

$$\begin{aligned}
&a_n[R_N,\theta] = a_n[f,\theta] - \chi_N(n)\,\frac{a_n[f,\varphi]}{\lambda_n}\,K_n(\theta\cdot\varphi);\\
&\|a_n[R_N]\|_S^2 = \|a_n[f]\|_S^2 - \chi_N(n)\,\frac{|a_n[f,\varphi]|^2}{\lambda_n},
\end{aligned} \tag{26}$$

where $\chi_N(n) = 1$ for $n \le N$ and $\chi_N(n) = 0$ for $n > N$. Relations (25) and (26) complete the proof of the lemma, and Theorem 3 also follows in view of (17).

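The Radon compass (10), (13) can be rewritten in terms of the Chebyshev momenta, cf. (23):

$$\begin{aligned}
&C[f_N,\theta] = \|\mathcal{E}[f_N,\theta]\|_w^2 = \sum_{n=0}^\infty |a_n[f_N,\theta]|^2,\\
&C_N[R_N,\theta] = \sum_{n=0}^N |a_n[R_N,\theta]|^2.
\end{aligned} \tag{27}$$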

A useful property in applications is the uniform continuity of the sequences $\{C[f_N]\}$, $\{C_N[R_N]\}$ for $f \in \mathcal{L}^2(B)$. This property is a corollary of the shrinkage, cf. Lemma 2. For a function $B(\theta)$ continuous on the sphere $S$, and $M \ge 0$, let $E_M[B, \mathcal{L}^\infty(S)]$ denote the value of best approximation of $B$ by spherical polynomials of degree $\le M$ in the uniform metric on $S$:

$$E_M[B, \mathcal{L}^\infty(S)] := \min_{\deg T \le M}\,\max_{\theta\in S} |B(\theta) - T(\theta)|.$$

Lemma 3 The following inequalities are true:

$$\max\big(E_{2M}[C[f_N], \mathcal{L}^\infty(S)],\ E_{2M}[C_N[R_N], \mathcal{L}^\infty(S)]\big) \le \big(E_M[f, \mathcal{L}^2(B)]\big)^2.$$
Indeed, let (cf. (27))

$$T_{M,N}(\theta) := \sum_{n=0}^M |a_n[f_N,\theta]|^2, \qquad t_{M,N}(\theta) := \sum_{n=0}^{\min(M,N)} |a_n[R_N,\theta]|^2.$$

Since $a_n[f] \in \mathcal{T}_n$, we have $T_{M,N}, t_{M,N} \in \mathcal{T}_{2M}$. Further, $\|a, \mathcal{L}^\infty(S)\|^2 \le \lambda_n \|a, \mathcal{L}^2(S)\|^2$ for each polynomial $a(\theta) \in \mathcal{T}_n$, cf. (25). Consequently, according to (17),

$$\begin{aligned}
&\max\big(\|a_n[f_N], \mathcal{L}^\infty(S)\|^2,\ \|a_n[R_N], \mathcal{L}^\infty(S)\|^2\big) \le \lambda_n\,\|a_n[f]\|_S^2,\\
&\|C[f_N] - T_{M,N}, \mathcal{L}^\infty(S)\| \le \sum_{n=M+1}^\infty \lambda_n\,\|a_n[f]\|_S^2 = \big(E_M[f, \mathcal{L}^2(B)]\big)^2,
\end{aligned}$$

and for exactly the same reason

$$\|C_N[R_N] - t_{M,N}, \mathcal{L}^\infty(S)\| \le \big(E_M[f, \mathcal{L}^2(B)]\big)^2.$$

This completes the proof of the lemma.

Now we turn to estimates of errors in the gridge processes (1), (4). The maxima of $C$ can obviously be estimated from below by the averages over $S$:

$$\begin{aligned}
&\max_\theta C[f,\theta] \ge \int_S C[f,\theta]\,\mu(d\theta) = \sum_{n=0}^\infty \|a_n[f]\|_S^2,\\
&\max_\theta C_N[R_N,\theta] \ge \sum_{n=0}^N \|a_n[R_N]\|_S^2.
\end{aligned} \tag{28}$$

Thus, by (11), (13) and Parseval's identity (16), we obtain the following recursive estimates of errors in (1), (4) via Chebyshev momenta (note that $\lambda_0 = 1$):

$$\begin{aligned}
&\|f_{N+1}\|^2 \le \sum_{n=1}^\infty (\lambda_n - 1)\,\|a_n[f_N]\|_S^2,\\
&\|R_{N+1}\|^2 \le \sum_{n=1}^N (\lambda_n - 1)\,\|a_n[R_N]\|_S^2 + \sum_{n=N+1}^\infty \lambda_n\,\|a_n[R_N]\|_S^2.
\end{aligned} \tag{29}$$

Further, by (17) we have

$$\lambda_n\,\|a_n[f]\|_S^2 = (E_{n-1}[f])^2 - (E_n[f])^2,$$
so that, applying Abel's transformation, we can rewrite the estimates (29) in terms of best polynomial approximations:

$$\begin{aligned}
&\|f_{N+1}\|^2 \le \sum_{n=0}^\infty \Delta\xi_n\,(E_n[f_N])^2,\\
&\|R_{N+1}\|^2 \le \sum_{n=0}^{N-1} \Delta\xi_n\,(E_n[R_N])^2 + \xi_N\,(E_N[R_N])^2,
\end{aligned} \tag{30}$$

where $\xi_n := 1/\lambda_n$, $\Delta\xi_n := \xi_n - \xi_{n+1}$. Now we make use of the polynomial shrinkage. According to Theorem 3, we can estimate the best polynomial approximations in (30) as follows:

$$E_n[f_N] \le \min\big(\|f_N\|, E_n[f]\big), \qquad E_n[R_N] \le \min\big(\|R_N\|, E_n[f]\big).$$

Consequently, the following iterative estimates are true:

$$\begin{aligned}
&\big(\mathcal{R}^g_{N+1}[f]\big)^2 \le \sum_{n=0}^\infty \Delta\xi_n \min\big((\mathcal{R}^g_N[f])^2, E_n^2[f]\big),\\
&\big(\mathcal{R}^p_{N+1}[f]\big)^2 \le \sum_{n=0}^{N-1} \Delta\xi_n \min\big((\mathcal{R}^p_N[f])^2, E_n^2[f]\big) + \xi_N\,(E_N[f])^2.
\end{aligned} \tag{31}$$
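
The Abel transformation that turns the first line of (29) into the first line of (30) is a purely algebraic identity, and it can be checked on finitely supported data. In the sketch below the squared momentum norms `A[n]` are random placeholders (an assumption for illustration), the squared best approximations are computed from them as in (17), and both sides are compared; the function name `check_abel` is hypothetical.

```python
import math
import random

def check_abel(d, n_max, rng):
    """Return both sides of
    sum_{n>=1} (lambda_n - 1) A_n = sum_{n>=0} (xi_n - xi_{n+1}) E_n^2
    for squared momentum norms A_n that vanish beyond n_max."""
    lam = [math.comb(n + d - 1, d - 1) for n in range(n_max + 2)]
    xi = [1.0 / l for l in lam]
    # placeholder values standing in for ||a_n||_S^2
    A = [rng.uniform(0, 1) for _ in range(n_max + 1)]
    # E_n^2 = sum_{k > n} lambda_k A_k, cf. (17)
    E2 = [sum(lam[k] * A[k] for k in range(n + 1, n_max + 1))
          for n in range(n_max + 1)]
    lhs = sum((lam[n] - 1) * A[n] for n in range(1, n_max + 1))
    rhs = sum((xi[n] - xi[n + 1]) * E2[n] for n in range(n_max + 1))
    return lhs, rhs

lhs, rhs = check_abel(3, 12, random.Random(0))
```

The boundary term of the summation by parts vanishes because $\lambda_0 = 1$, that is, $1 - \xi_0 = 0$.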

0.4  Recursive  truncations  and  difference  equations 

Let $\varepsilon(\xi) = \varepsilon_n := (E_n[f])^2$, $\xi \in (\xi_{n+1}, \xi_n]$, $n = 0, 1, \dots$, where, as above, $\xi_n = 1/\lambda_n$. Obviously, $\varepsilon(\xi)$ is a non-decreasing step function on $(0,1]$, and $\varepsilon(\xi) \to 0$ as $\xi \to 0$.

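Consider the sequences $\{a_N\}$, $\{b_N\}$ defined by the integrals of successively truncated $\varepsilon$:

$$\begin{aligned}
&a_0 = b_0 := \varepsilon_0; \qquad a_{N+1} = \int_0^1 \min\big(\varepsilon(\xi), a_N\big)\,d\xi,\\
&b_{N+1} = \xi_N\,\varepsilon(\xi_N) + \int_{\xi_N}^1 \min\big(\varepsilon(\xi), b_N\big)\,d\xi.
\end{aligned} \tag{32}$$

It follows from (31) that

$$\mathcal{R}^g_N[f] \le \sqrt{a_N}, \qquad \mathcal{R}^p_N[f] \le \sqrt{b_N}, \qquad N = 0, 1, \dots, \tag{33}$$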
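
The recursive truncations for $a_N$ are easy to iterate numerically, because $\varepsilon$ is a step function and the integral reduces to a sum over the steps. The sketch below is only an experiment under stated assumptions: the values `eps[n]` follow a hypothetical power-law model $E_n = (n+1)^{-r}$, the series is cut off after finitely many terms (which slightly underestimates the integral), and the names `xi` and `truncation_sequence` are ours.

```python
import math

def xi(n, d):
    """xi_n = 1 / lambda_n with lambda_n = C(n+d-1, d-1)."""
    return 1.0 / math.comb(n + d - 1, d - 1)

def truncation_sequence(eps, d, steps):
    """Iterate a_{N+1} = sum_n (xi_n - xi_{n+1}) * min(eps[n], a_N).

    `eps` is a finite non-increasing list standing in for (E_n[f])^2;
    cutting the series at len(eps) terms is an approximation.
    """
    dxi = [xi(n, d) - xi(n + 1, d) for n in range(len(eps))]
    a = [eps[0]]
    for _ in range(steps):
        a.append(sum(w * min(e, a[-1]) for w, e in zip(dxi, eps)))
    return a

d, r = 3, 1.0
eps = [(n + 1) ** (-2 * r) for n in range(2000)]   # hypothetical model
a = truncation_sequence(eps, d, 200)
# ratio of a_N to eps at index about N^(1/(d-1)), for inspection only
ratios = [a[N] / eps[int(N ** (1 / (d - 1)))] for N in (10, 50, 200)]
```

Printing `ratios` gives a rough feeling for how $a_N$ compares with $\varepsilon$ at the index $N^{1/(d-1)}$ in this model; no particular values are claimed here.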
so that to finish the proofs of Theorems 1 and 2, it suffices to establish the appropriate upper estimates of the numbers $a_N$, $b_N$ defined by the recursive truncations (32).

An estimate sufficient for this purpose is provided by the next statement.

Lemma  4  Let 

H(0  :=  j  sup  In  ,  £  G  (0,1]  ; 

?  £vn) 

S(z)  inf{£  G  (0, 1)  :  #(£)  <  z},  z  >  0. 

Then  the  following  estimates  hold  for  ajv,  defined  by  (32) 

aN  <  2e  ^<5  ^— ))  ;  bN  <  2  +  e  (s  ,  N  =  0, 1, ...  .  (34) 

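Indeed, denote by $m(y) := \mathrm{meas}\{\xi \in [0,1) : \varepsilon(\xi) \le y\}$, $y \in [0, \varepsilon_0)$, the distribution function for $\varepsilon(\xi)$. Then $m(y) = \xi_{n+1}$, $y \in [\varepsilon_{n+1}, \varepsilon_n)$, $n = 0, 1, \dots$, and

$$\xi\,\varepsilon(\xi) + \int_\xi^1 \min\big(\varepsilon(\eta), a\big)\,d\eta = a - \int_{\varepsilon(\xi)}^a m(y)\,dy, \qquad \varepsilon(\xi) \le a,$$

which easily follows by consideration of the corresponding areas.

Let $M(y) := \int_0^y m(z)\,dz$. If we take $\xi = 0$, $a = a_N$, or, respectively, $\xi = \xi_N$, $a = b_N$, we see that $\{a_N\}$, $\{b_N\}$ in (32) coincide with the solutions of the non-linear difference equations

$$a_N - a_{N+1} = M(a_N), \qquad b_N - b_{N+1} = M(b_N) - M(\varepsilon_N), \qquad N = 0, 1, \dots \tag{35}$$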
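
The area argument behind the first equation in (35) can be reproduced for a finite step function. The sketch assumes a truncated model: $\varepsilon(\xi) = \varepsilon_n$ on $(\xi_{n+1}, \xi_n]$ for $n < K$ and $\varepsilon(\xi) = 0$ on $(0, \xi_K]$, with $\xi_0 = 1$; the function name `layer_cake_check` and the sample data are hypothetical.

```python
import math

def layer_cake_check(eps, xi, a):
    """Compare the integral of min(eps(t), a) over (0, 1] with a - M(a),
    where eps(t) = eps[n] on (xi[n+1], xi[n]] and 0 on (0, xi[K]]."""
    K = len(eps)
    direct = sum((xi[n] - xi[n + 1]) * min(eps[n], a) for n in range(K))
    # M(a) is the integral over (0, a) of m(y) = meas{t : eps(t) <= y}
    levels = [0.0] + sorted(eps) + [math.inf]    # breakpoints of m
    M = 0.0
    for lo, hi in zip(levels, levels[1:]):
        if lo >= a:
            break
        # m is constant on [lo, hi): total length of the steps with eps <= lo
        m = xi[K] + sum(xi[n] - xi[n + 1] for n in range(K) if eps[n] <= lo)
        M += m * (min(hi, a) - lo)
    return direct, a - M

K = 50
xi = [1.0 / (n + 1) for n in range(K + 1)]       # xi_n for d = 2
eps = [(n + 1) ** -2.0 for n in range(K)]        # hypothetical eps_n
direct, via_M = layer_cake_check(eps, xi, 0.1)
```

Both numbers coincide up to rounding, which is the statement $a_{N+1} = a_N - M(a_N)$ with $a = a_N$.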


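Since $M(y)$ is an increasing function, we have $M(y) \le M(a_k)$, $y \in (a_{k+1}, a_k)$. Thus,

$$\int_{a_N}^{a_0} \frac{dy}{M(y)} = \sum_{k=0}^{N-1} \int_{a_{k+1}}^{a_k} \frac{dy}{M(y)} \ge \sum_{k=0}^{N-1} \frac{a_k - a_{k+1}}{M(a_k)} = N. \tag{36}$$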

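We have $M(y) \ge \int_{y/2}^y m(z)\,dz \ge \frac{y}{2}\,m\big(\frac{y}{2}\big)$, because $m(y)$ is also an increasing function. Consequently, it follows from (36) that

$$\int_{a_N/2}^{\varepsilon_0} \frac{dy}{y\,m(y)} \ge \frac12 \int_{a_N}^{a_0} \frac{dy}{\frac{y}{2}\,m\big(\frac{y}{2}\big)} \ge \frac12 \int_{a_N}^{a_0} \frac{dy}{M(y)} \ge \frac{N}{2}. \tag{37}$$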
Further, let


m:=r 


G?(lne(?7)) 


£e(o,i]\LU«- 


T  V  0 

The function $h(\xi)$ is decreasing, piece-wise constant, and $h(\xi) \to \infty$ as $\xi \to 0+$.
Moreover, 


m = £  ££  *<«  * "«>•  (38) 
Indeed,  for  £n+i  <  £  <  £n  we  have 

r£ 0  dy  =  p  dy  =  ^  ^  =  r1  d^nejrj)) 

Js(z)  ym(y) 

j= 0  ^£j+ 1  vm(y)  £j+ 1  £j+i  -4  v 

and the inequality $h(\xi) \le H(\xi)$ easily follows by subdivision of the domain $\eta \ge \xi$ into the intervals $[\xi 2^k, \xi 2^{k+1}]$, $k = 0, 1, \dots$

(37)  and  (38)  imply: 


h 


H 


q>n  ^  2s 


(39) 


This completes the proof of the estimate (34) for $a_N$.

To  estimate  {6jv},  we  need  to  somewhat  modify  the  above  arguments. 
Let 


An  :=  {k  :  0  <  k  <  N,  bk  <  2ek},  BN  :=  {k  :  0  <  k  <  N,  bk  >  2ek}, 


and  let  u  be  an  integer,  0  <  u  <  N.  Then  either  a)  cardZLv  >  v,  or 
b)  card^Tv  >  N  —  v.  In  the  latter  case,  b jv  <  2 £n-u,  simply  in  view  of 
monotonicity  of  the  sequences  {&&},  {£&}. 

In  the  case  a)  for  k  e  BN 


h  ~  bk+ 1  — 


1  fbk 

m(y)  dy  >  -  m(y)  dy 
2j  J  o 


Mjh) 

2 


so  that  the  estimates  (37),  (39)  are  modified  as  follows: 


rb°  dy  fbk  dy  ^  ^-\  bk  bk+ 1 

JbN  M(y)  ^  Jbk+1  M(y)  ~  k^N  M{bk) 


cards  at  v 

> - -  >  -  , 

2  “  2  ’ 


bjy  <  2  £ 
Summarizing, we see that

b"  -  20?Sv  (£"--  +  £('S(i)))'  hN-2(£T+E(S(j)))  ' 

and the estimates (34) for $b_N$ follow.

Now recall the definition of the function $\varepsilon(\xi)$ and that

$$\xi_n = \frac{1}{\lambda_n} = \frac{(d-1)!}{(n+1)\cdots(n+d-1)} \sim \frac{(d-1)!}{n^{d-1}}, \qquad n \to \infty.$$

It is easy to see that the condition $E_n[f] = O(E_{2n}[f])$ implies that $\varepsilon(2\xi) = O(\varepsilon(\xi))$, and further, $H(\xi) = O(1/\xi)$, $\xi \to 0$; $\delta(v) = O(1/v)$, $v \to \infty$. The latter estimate, (33) and (34) complete the proofs of Theorems 1 and 2, because

$$\varepsilon(O(1/N)) = O(\varepsilon(1/N)) = O\big(E^2_{N^{1/(d-1)}}[f]\big), \qquad N \to \infty.$$

Remark 1. Obviously, we can reformulate the theorems in terms of majorizing sequences, instead of exact values of best approximations.

If a sequence of positive numbers $\{\varepsilon_n\}_0^\infty$ monotonically tends to 0 for $n \to \infty$ and satisfies the condition $\varepsilon_n = O(\varepsilon_{2n})$, $n \to \infty$, and if the estimate $E_n[f] = O(\varepsilon_n)$, $n \to \infty$, holds for best polynomial approximations of a function $f \in \mathcal{L}^2(B)$, then the errors of the gridge processes (1), (4) satisfy the estimates

$$\mathcal{R}^g_N[f] = O\big(\varepsilon_{N^{1/(d-1)}}\big), \qquad \mathcal{R}^p_N[f] = O\big(\varepsilon_{N^{1/(d-1)}}\big), \qquad N \to \infty.$$

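Remark 2. Let us also briefly consider some classes of functions with very fast decreasing polynomial approximations, say, satisfying $E_n[f] = O(e^{-n^\alpha})$, where $\alpha$ is a positive number. In such a case we have $\varepsilon(\xi) \sim \exp\big(-c\,\xi^{-\beta}\big)$, $\beta = \frac{\alpha}{d-1}$, $\xi \to 0$, and consequently

$$H(\xi) = O\big(\xi^{-1-\beta}\big), \quad \xi \to 0; \qquad \delta(v) = O\big(v^{-\frac{1}{1+\beta}}\big), \quad v \to \infty.$$

Therefore, by (33) and (34),

$$E_n[f] = O\big(e^{-n^\alpha}\big) \ \Longrightarrow\ \mathcal{R}^g_N[f] + \mathcal{R}^p_N[f] = O\big(e^{-c_1 N^\gamma}\big), \qquad \gamma = \frac{\alpha}{\alpha + d - 1},$$

where $c$, $c_1$ are positive constants depending only on $\alpha$, $d$.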

In  particular,  for  functions  of  two  variables  in  the  disc  B 2 

EM]  =  O  (e-“)  =*■  n%[f]  +  Kf[f]  =  O  (e-^)  . 